78 Examples, Massive Gains: LIMI Turns Tiny Datasets into Powerful Software Agents
LIMI uses 78 curated, tool-grounded trajectories to fine-tune GLM models, hitting 73.5% on AgencyBench and outperforming large-sample baselines by a wide margin.
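For orientation, here is a minimal sketch of the small-data supervised fine-tuning recipe the summary describes: a causal LM trained on a handful of curated agent trajectories. The model name, dataset format, and hyperparameters below are illustrative assumptions, not LIMI's published configuration.

```python
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "gpt2"  # stand-in; LIMI fine-tunes much larger GLM-family models

class TrajectoryDataset(Dataset):
    """Curated (prompt, tool-grounded trajectory) pairs as token ids."""
    def __init__(self, pairs, tokenizer, max_len=1024):
        self.items = [
            tokenizer(prompt + trajectory, truncation=True,
                      max_length=max_len,
                      return_tensors="pt")["input_ids"].squeeze(0)
            for prompt, trajectory in pairs
        ]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        ids = self.items[i]
        # Standard causal-LM objective: the model shifts labels
        # internally, so we pass a copy of the input ids.
        return {"input_ids": ids, "labels": ids.clone()}

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# LIMI's point is that `pairs` can be tiny (78 trajectories) if each
# one is carefully curated; these two entries are placeholders.
pairs = [
    ("Task: fix the failing unit test.\n", "Thought: run pytest ..."),
    ("Task: add pagination to the API.\n", "Thought: inspect routes ..."),
]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=3,
                           per_device_train_batch_size=1,
                           learning_rate=1e-5, report_to="none"),
    train_dataset=TrajectoryDataset(pairs, tokenizer),
)
trainer.train()
```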
Apriel-1.5-15B-Thinker is a 15B open-weights multimodal reasoning model that scores 52 on the Artificial Analysis Intelligence (AAI) index and fits on a single GPU, offering reproducible training artifacts and competitive benchmark results at low cost.
MIT researchers show that on-policy reinforcement learning preserves prior capabilities better than supervised fine-tuning by minimizing the forward KL divergence between the base and fine-tuned models.
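The quantity involved is easy to state concretely. Below is a hedged diagnostic sketch of forward KL between a base and a fine-tuned model's next-token distributions, assuming access to both models' logits on the same inputs; the tensor shapes and toy data are illustrative, and the paper's exact estimator and evaluation distribution may differ.

```python
import torch
import torch.nn.functional as F

def forward_kl(base_logits: torch.Tensor, ft_logits: torch.Tensor) -> torch.Tensor:
    """KL(p_base || p_ft), averaged over sequence positions.

    base_logits, ft_logits: (seq_len, vocab) next-token logits computed
    on the same inputs. Forward KL heavily penalises tokens the base
    model assigns mass to but the fine-tuned model no longer covers,
    which is why a small value indicates preserved prior capabilities.
    """
    log_p = F.log_softmax(base_logits, dim=-1)
    log_q = F.log_softmax(ft_logits, dim=-1)
    return (log_p.exp() * (log_p - log_q)).sum(-1).mean()

# Toy usage: random logits stand in for the two models' outputs.
base = torch.randn(16, 32000)
ft = base + 0.1 * torch.randn(16, 32000)  # a mildly drifted "fine-tune"
print(float(forward_kl(base, ft)))
```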
Prefix-RFT blends supervised and reinforcement fine-tuning by using partial demonstration prefixes to guide exploration, achieving stronger and more stable performance on math reasoning benchmarks than SFT, RFT, and hybrid baselines.
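As a rough illustration of the prefix idea, the sketch below keeps a random fraction of a demonstration as a fixed prefix and lets the policy generate the remainder; the function names and prefix-length schedule are assumptions, not the paper's exact algorithm. The sampled continuation would then be scored with the task reward (e.g., answer correctness on a math problem) and optimized with a standard policy-gradient update, while the demonstration prefix anchors exploration to good territory.

```python
import random

def prefix_guided_rollout(policy_sample, demo_tokens, max_frac=0.8):
    """Keep a random fraction of the demonstration as a fixed prefix,
    then let the policy explore the continuation on its own."""
    cut = int(len(demo_tokens) * random.uniform(0.0, max_frac))
    prefix = demo_tokens[:cut]
    continuation = policy_sample(prefix)  # model completes from the prefix
    return prefix, continuation

# Toy usage: a stand-in generator in place of an actual language model.
def toy_policy_sample(prefix):
    return prefix + ["<model-generated steps>", "<answer>"]

demo = ["read problem", "set up equation", "solve", "42"]
print(prefix_guided_rollout(toy_policy_sample, demo))
```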
OpenThoughts introduces a scalable supervised fine-tuning data pipeline that produces substantially stronger reasoning datasets and models, achieving state-of-the-art performance in math, coding, and science domains.
Yandex has launched Alchemist, a compact supervised fine-tuning dataset built by model-guided selection of high-impact image-text pairs, which significantly improves text-to-image model quality.